Abstract

Various methods to control the influence of a covariate on a response variable are compared. In particular, 
ANOVA with or without homogeneity of variances (HOV) of errors and Kruskal-Wallis (K-W) tests on 
covariate-adjusted residuals and analysis of covariance (ANCOVA) are compared. Covariate-adjusted 
residuals are obtained from the overall regression line fit to the entire data set ignoring the treatment levels 
or factors. The underlying assumptions for ANCOVA and methods on covariate-adjusted residuals are 
determined and the methods are compared only when both methods are appropriate. It is demonstrated 
that the methods on covariate-adjusted residuals are only appropriate in removing the covariate influence 
when the treatment-specific lines are parallel and treatment-specific covariate means are equal. Empirical 
size and power performance of the methods are compared by extensive Monte Carlo simulations. We 
manipulated the conditions such as assumptions of normality and HOV, sample size, and clustering of the 
covariates. The parametric methods (i.e., ANOVA with or without HOV on covariate-adjusted residuals 
and ANCOVA) exhibited similar size and power when error terms have symmetric distributions with 
variances having the same functional form for each treatment, and covariates have uniform distributions 
within the same interval for each treatment. For large samples, it is shown that the parametric methods will 
give similar results if sample covariate means for all treatments are similar. In such cases, parametric tests 
have higher power compared to the nonparametric K-W test on covariate-adjusted residuals. When error
terms have asymmetric distributions or have variances that are heterogeneous with different functional
forms for each treatment, ANCOVA and analysis of covariate-adjusted residuals are liberal, with the K-W test
having higher power than the parametric tests. The methods on covariate-adjusted residuals are severely
affected by the clustering of the covariates relative to the treatment factors when covariate means are
very different for treatments. For clustered data, the ANCOVA method exhibits the appropriate level. However,
such clustering might suggest dependence between the covariates and the treatment factors, which makes
ANCOVA less reliable as well. Guidelines on which method to use for various cases are also provided.

Keywords: allometry; ANOVA; clustering; homogeneity of variances; isometry; Kruskal-Wallis test; linear 
models; parallel lines model 



1 Introduction 



In an experiment, the response variable may depend on the treatment factors and quite often on some external 
factor that has a strong influence on the response variable. If such external factors are qualitative or discrete, 
then blocking can be performed to remove their influence. However, if the external factors are quantitative
and continuous, the effect of the external factor can be accounted for by adopting it as a covariate (Kuehl),
which is also called a concomitant variable (Ott (1993), Milliken and Johnson (2002)). Throughout
this article, a covariate is defined to be a variable that may affect the relationship between the response
variable and factors (or treatments) of interest, but is not of primary interest itself. Maxwell et al. (1984)
compared methods of incorporating a covariate into an experimental design and showed that it is not correct 
to consider the correlation between the dependent variable and covariate in choosing the best technique. 
Instead, they recommend considering whether scores on the covariate are available for all subjects prior to 
assigning any subject to treatment conditions and whether the relationship of the dependent variable and 
covariate is linear. 

In various disciplines such as ecology, biology, and medicine, the goal is comparison of a response variable
among several treatments after the influence of the covariate is removed. Different techniques are used or
suggested in statistical and biological literature to remove the influence of the covariate(s) on the response
variable (Huitema). For example, in ecology, one might want to compare richness-area relationships
among regions, shoot ratios of plants among several treatments, and C:N ratios among sites (Garcia-Berthou
(2001)). There are three main statistical techniques for attaining that goal: (i) analysis of the ratio of response
to the covariate; (ii) analysis of the residuals from the regression of the response with the covariate; and (iii) 
analysis of covariance (ANCOVA). 

Analysis of the ratios is perhaps the oldest method used to remove the covariate effect (e.g., size effect
in biology) (see Albrecht et al. (1993) for a comprehensive review). Although many authors recommend
its disuse (Packard and Boardman (1988), Atchley et al.), it might still appear in literature on occasion
(Albrecht et al. (1993)). For instance, in physiological and nutrition research, the data are scaled
by taking the ratio of the response variable to the covariate. Using the ratios in removing the influence
of the covariate on the response depends on the relationship between the response and the covariate
variables (Raubenheimer and Simpson (1992)). Regression analysis of a response variable on the covariate(s) is
common to detect such relationships, which are categorized as isometric or allometric relationships (Small
(1996)). Isometry occurs when the relationship between a response variable and the covariate is linear with
a zero intercept. If the relationship is nonlinear or if there is a non-zero intercept, it is called allometry. In
allometry, the influence of the covariate cannot be removed by taking the ratio of the response to the
covariate. In both the allometry and isometry cases, ANOVA on ratios (i.e., response/covariate values) introduces
heterogeneity of variances in the error terms, which violates an assumption of ANOVA (with homogeneity of
variances (HOV)). Hence, ANOVA on ratios may give spurious and invalid results for treatment comparisons,
so ANCOVA is recommended over the use of ratios (Raubenheimer and Simpson (1992)). See Ceyhan (2000)
for a detailed discussion on the use of ratios to remove the covariate influence.

An alternative method to remove the effect of a covariate on the response variable in biological and
ecological research is the use of residuals (Garcia-Berthou (2001)). In this method an overall regression
line is fitted to the entire data set and residuals are obtained from this line (Beaupre and Duvall (1998)).
These residuals will be referred to as covariate-adjusted residuals, henceforth. This method was recommended
in ecological literature by Jakob et al. (1996), who called it the "residual index" method. Then treatments are
compared with ANOVA with HOV on these residuals.

Due to the problems associated with the use of ratios in removing the influence of the covariate from the
response, ANOVA (with HOV) on covariate-adjusted residuals and ANCOVA were recommended over the
use of ratios (Packard and Boardman (1988) and Atchley et al.). For example, Beaupre and Duvall
(1998) used ANOVA on covariate-adjusted residuals in a zoological study. Ceyhan (2000) compared
ANCOVA and ANOVA (with HOV) on covariate-adjusted residuals. ANCOVA has been widely applied in
ecology and it was shown to be a superior alternative to ratios by Garcia-Berthou (2001), who also points out
problems with the residual index and recommends ANCOVA as the correct alternative. That paper also discusses
the differences between ANCOVA and ANOVA on the residual index, and argues that the residual analysis
is totally misleading as (i) residuals are obtained from an overall regression on the pooled data, (ii) the
residual analysis uses the wrong degrees of freedom in inference, and (iii) residuals fail to satisfy the ANOVA
assumptions even if the original data did satisfy them. In fact, Maxwell et al. also demonstrated the
inappropriateness of ANOVA on residuals.



Although ANCOVA is a well-established and highly recommended tool, it also has critics. However, the
main problem in literature is not the inappropriateness of ANCOVA, but rather its misuse and misinterpretation.
For example, Rheinheimer and Penfield (2001) investigated how the empirical size and power performances
of ANCOVA are affected when the assumptions of normality and HOV, sample size, number of treatment
groups, and strength of the covariate-dependent variable relationship are manipulated. They demonstrated
that for balanced designs, the ANCOVA F test was robust and was often the most powerful test through
all sample-size designs and distributional configurations. Otherwise it was not the best performer. In fact,
the assumptions for ANCOVA are crucial for its use; in particular, the independence between the covariate
and the treatment factors is an often ignored assumption, resulting in incorrect inferences (Miller and Chapman).
This violation is very common in fields such as psychology and psychiatry, due to nonrandom group
assignment in observational studies, and Miller and Chapman also suggest some alternatives for such
cases. Hence the recommendations in favor of ANCOVA (including ours) are valid only when the underlying
assumptions are met.

In this article, we demonstrate that it is not always wrong to use the residuals. We also discuss the
differences between ANCOVA and analysis of residuals, and identify the conditions under which the two
procedures are appropriate and comparable. Then under such conditions, we not only consider ANOVA (with
HOV), but also ANOVA without HOV and Kruskal-Wallis (K-W) test on the covariate-adjusted residuals. 
We provide the empirical size performance of each method under the null case and the empirical power under 
various alternatives using extensive Monte Carlo simulations. 

The nonparametric analysis by K-W test on the covariate-adjusted residuals is actually not entirely
nonparametric, in the sense that the residuals are obtained from a fully parametric model. However, when the
covariate is not continuous but categorical with ordinal levels, a nonparametric version of ANCOVA
can be performed (see, e.g., Akritas et al. and Tsangari and Akritas (2004a)). Further, the nonparametric
ANCOVA model of Akritas et al. is extended to longitudinal data for up to three covariates
(Tsangari and Akritas (2004b)). Additionally, there are nonparametric methods such as Quade's procedure,
Puri and Sen's solution, Burnett and Barr's rank difference scores, Conover and Iman's rank transformation
test, Hettmansperger's procedure, and the Puri-Sen-Harwell-Serlin test which can be used as alternatives to
ANCOVA (see Rheinheimer and Penfield (2001) for the comparison of these tests with ANCOVA and
relevant references). In fact, Rheinheimer and Penfield (2001) showed that with unbalanced designs, with
variance heterogeneity, and when the largest treatment-group variance was matched with the largest group
sample size, these nonparametric alternatives generally outperformed the ANCOVA test.

The methods to remove covariate influence on the response are presented in Section 2, where the ANCOVA
method, ANOVA with HOV and without HOV on covariate-adjusted residuals, and K-W test on
covariate-adjusted residuals are described. A detailed comparison of the methods, in terms of the null hypotheses and
the conditions under which they are equivalent, is provided in Section 3. The Monte Carlo simulation analysis
used for the comparison of the methods in terms of empirical size and power is provided in Section 4. A
discussion together with a detailed guideline on the use of the discussed methods is provided in Section 5.



2 ANCOVA and Methods on Covariate-Adjusted Residuals

In this section, the models and the corresponding assumptions for ANCOVA and the methods on covariate- 
adjusted residuals are provided. 



2.1 ANCOVA Method 

For convenience, only ANCOVA with a one-way treatment structure in a completely randomized design and 
a single covariate is investigated. A simple linear relationship between the covariate and the response for each 
treatment level is assumed. 

Suppose there are $t$ levels of a treatment factor, with each level having $s_i$ observations; and there are
$r_{ij}$ replicates for each covariate value for treatment level $i$ for $i = 1, 2, \ldots, t$ and
$j = 1, 2, \ldots, n_i$, where $n_i$ is the number of distinct covariate values at treatment level $i$. Let $n$ be
the total number of observations in the entire data set; then $s_i = \sum_{j=1}^{n_i} r_{ij}$ and
$n = \sum_{i=1}^{t} s_i$. ANCOVA fits a straight line to each treatment level.

These lines can be modeled as

$$Y_{ijk} = \mu_i + \beta_i X_{ij} + e_{ijk} \tag{1}$$

where $X_{ij}$ is the $j$th value of the covariate for treatment level $i$, $Y_{ijk}$ is the $k$th response at $X_{ij}$,
$\mu_i$ is the intercept and $\beta_i$ is the slope for treatment level $i$, and $e_{ijk}$ is the random error term for
$i = 1, 2, \ldots, t$, $j = 1, 2, \ldots, n_i$, and $k = 1, 2, \ldots, r_{ij}$. The assumptions for the ANCOVA model in
Equation (1) are: (a) The $X_{ij}$ (covariate) values are assumed to be fixed as in regression analysis (i.e., $X_{ij}$
is not a random variable). (b) $e_{ijk} \stackrel{iid}{\sim} N(0, \sigma_e^2)$ for all treatments, where
$\stackrel{iid}{\sim}$ stands for "independently identically distributed as". This implies $Y_{ijk}$ are independent
of each other and $Y_{ijk} \sim N(\mu_i + \beta_i X_{ij}, \sigma_e^2)$. (c) The covariate and
the treatment factors are independent. Then the straight line fitted by ANCOVA to each treatment can be
written as $\widehat{Y}_{ij} = \widehat{\mu}_i + \widehat{\beta}_i X_{ij}$, where $\widehat{Y}_{ij}$ is the predicted
response for treatment $i$ at $X_{ij}$, $\widehat{\mu}_i$ is the estimated intercept, and $\widehat{\beta}_i$ is the
estimated slope for treatment $i$.

In the analysis, these fitted lines can then be used to test the following null hypotheses:

(i) $H_0: \beta_1 = \beta_2 = \cdots = \beta_t = 0$ (All slopes are equal to zero).

If $H_0$ is not rejected, then the covariate is not necessary in the model. Then a regular one-way ANOVA can
be performed to test the equality of treatment means.

(ii) $H_0: \beta_1 = \beta_2 = \cdots = \beta_t$ (The slopes are equal).

Depending on the conclusion reached here, two types of models are possible for linear ANCOVA models:
parallel lines and nonparallel lines models. If $H_0$ in (ii) is not rejected, then the lines are parallel; otherwise
they are nonparallel (Milliken and Johnson (2002)). Throughout the article the terms "parallel lines models
(case)" and "equal slope models (case)" will be used interchangeably. The same holds for "nonparallel lines
models (case)" and "unequal slopes models (case)".

The parallel lines model is given by

$$Y_{ijk} = \mu_i + \beta X_{ij} + e_{ijk}, \tag{2}$$

where $\beta$ is the common slope for all treatment levels. With this model, testing the equality of the intercepts,
$H_0: \mu_1 = \mu_2 = \cdots = \mu_t$, is equivalent to testing the equality of treatment means at any value of the covariate.
For the nonparallel lines case, the model is as in Equation (1) with at least one $\beta_i$ being different for some
$i = 1, 2, \ldots, t$. So the comparison of treatments may give different results at different values of the covariate.
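As an illustration of the parallel lines model in Equation (2), the following minimal sketch estimates the common slope by the pooled within-treatment least-squares slope and then recovers the treatment-specific intercepts. The function name `fit_parallel_lines` and the toy data are assumptions made for this example; they are not taken from the article.

```python
def fit_parallel_lines(groups):
    """Least-squares fit of Y = mu_i + beta * X + e with one common slope.

    groups: list of (x_values, y_values) pairs, one pair per treatment level.
    Returns (common_slope, list_of_intercepts).
    """
    sxy = 0.0  # pooled within-treatment sum of cross products
    sxx = 0.0  # pooled within-treatment sum of squares of the covariate
    means = []
    for x, y in groups:
        x_bar = sum(x) / len(x)
        y_bar = sum(y) / len(y)
        means.append((x_bar, y_bar))
        sxy += sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        sxx += sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercepts = [y_bar - slope * x_bar for x_bar, y_bar in means]
    return slope, intercepts


# Hypothetical noise-free data on two parallel lines: Y = 1 + 2X and Y = 3 + 2X.
x1, x2 = [0.0, 2.0, 4.0, 6.0], [4.0, 6.0, 8.0, 10.0]
groups = [(x1, [1 + 2 * x for x in x1]), (x2, [3 + 2 * x for x in x2])]
slope, intercepts = fit_parallel_lines(groups)
```

With noise-free input the sketch recovers the slope and both intercepts, so any difference between the estimated intercepts reflects the treatment effect at every covariate value.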

2.2 Analysis of Covariate-Adjusted Residuals

First an overall regression line is fitted to the entire data set as:

$$\widehat{Y}_{ij} = \widehat{\mu} + \widehat{\beta}^* X_{ij}, \quad \text{for } i = 1, 2, \ldots, t \text{ and } j = 1, 2, \ldots, n_i, \tag{3}$$

where $\widehat{\mu}$ is the estimated overall intercept and $\widehat{\beta}^*$ is the estimated overall slope. The
residuals from this regression line are called covariate-adjusted residuals and are calculated as:

$$R_{ijk} = Y_{ijk} - \widehat{Y}_{ij} = Y_{ijk} - \widehat{\mu} - \widehat{\beta}^* X_{ij}, \quad \text{for } i = 1, 2, \ldots, t,\ j = 1, 2, \ldots, n_i, \text{ and } k = 1, 2, \ldots, r_{ij}, \tag{4}$$

where $R_{ijk}$ is the $k$th residual of treatment level $i$ at $X_{ij}$.
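The computation in Equations (3) and (4) can be sketched as follows. This is only an illustrative sketch: the helper names `simple_regression` and `covariate_adjusted_residuals` and the data values are invented for the example.

```python
def simple_regression(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def covariate_adjusted_residuals(groups):
    """Residuals from ONE regression line fitted to the pooled data (Equation (4)).

    groups: list of (x_values, y_values) pairs, one per treatment level.
    Returns a list of residual lists, in the same order as `groups`.
    """
    pooled_x = [xi for x, _ in groups for xi in x]
    pooled_y = [yi for _, y in groups for yi in y]
    intercept, slope = simple_regression(pooled_x, pooled_y)
    return [[yi - intercept - slope * xi for xi, yi in zip(x, y)] for x, y in groups]


# Hypothetical data for two treatments (values invented for illustration).
groups = [
    ([1.0, 3.0, 5.0, 7.0], [3.2, 6.9, 11.1, 15.0]),
    ([2.0, 4.0, 6.0, 8.0], [5.9, 10.1, 13.8, 18.2]),
]
residuals = covariate_adjusted_residuals(groups)
total = sum(sum(r) for r in residuals)  # zero up to rounding error
```

Because the line is fitted to the pooled data, the residuals sum to zero over the whole data set, not within each treatment.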



2.2.1 ANOVA with or without HOV on Covariate-Adjusted Residuals

In ANOVA with or without HOV procedures, the covariate-adjusted residuals in Equation (4) are taken to
be the response values, and tests of equal treatment means are performed on residual means.

The means model and assumptions for the one-way ANOVA with HOV on these covariate-adjusted
residuals are:

$$R_{ijk} = \rho_i + \varepsilon_{ijk}, \quad \text{for } i = 1, 2, \ldots, t,\ j = 1, 2, \ldots, n_i, \text{ and } k = 1, 2, \ldots, r_{ij}, \tag{5}$$

where $\rho_i$ is the mean residual for treatment $i$, and $\varepsilon_{ijk}$ are the (independent) random errors such that
$\varepsilon_{ijk} \sim N(0, \sigma_\varepsilon^2)$. Notice the common variance for all treatment levels. However, $R_{ijk}$
are not independent of each other, since $\sum_{i=1}^{t}\sum_{j=1}^{n_i}\sum_{k=1}^{r_{ij}} R_{ijk} = 0$, which also implies
that the overall mean of the residuals is zero.

Hence the model in Equation (5) and an effects model for residuals are identically parameterized.

For the nonparallel lines model in Equation (1), the residuals in Equation (4) will take the form:

$$R_{ijk} = Y_{ijk} - \widehat{Y}_{ij} = \mu_i + \beta_i X_{ij} + e_{ijk} - \left(\widehat{\mu} + \widehat{\beta}^* X_{ij}\right) = \left(\mu_i - \widehat{\mu}\right) + \left(\beta_i - \widehat{\beta}^*\right) X_{ij} + e_{ijk}.$$

Hence, the influence of the covariate will be removed if and only if

$$\widehat{\beta}^* = \beta_i \quad \text{for all } i = 1, 2, \ldots, t. \tag{6}$$

Then taking covariate-adjusted residuals can only remove the influence of the covariate when the
treatment-specific lines in Equation (1) and the overall regression in Equation (3) are parallel. Notice that the residuals
from the treatment-specific models in Equation (1) cannot be used as response values in an ANOVA with
HOV, because treatment sums of squares of such residuals are zero (Ceyhan (2000)).



In ANOVA without HOV on covariate-adjusted residuals, the only difference from ANOVA with HOV
is that $\varepsilon_{ijk}$ are the (independent) random errors such that $\varepsilon_{ijk} \sim N(0, \sigma_i^2)$. Notice the
treatment-specific variance term $\sigma_i^2$; i.e., the homogeneity of the variances is not necessarily assumed in this model.

The Kruskal-Wallis (K-W) test is an extension of the Mann-Whitney U test to three or more groups; and
for two groups the K-W test and the Mann-Whitney U test are equivalent (Siegel and Castellan Jr. (1988)). The K-W
test on the covariate-adjusted residuals, which are obtained as in model (3), tests the equality of the residual
distributions for all treatment levels. Notice that, contrary to the parametric models and tests in previous
sections, only the distributional equality is assumed, neither normality nor HOV.
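A minimal sketch of the K-W statistic applied to residuals is given below. It uses the standard rank-sum form of the statistic with average ranks and the usual tie correction, and the large-sample chi-square approximation with 1 df for the two-treatment case. The function names and the residual values are hypothetical, and the asymptotic p-value is only indicative for samples this small.

```python
import math


def kruskal_wallis_h(samples):
    """Kruskal-Wallis H statistic with average ranks and the usual tie correction."""
    pooled = sorted((value, g) for g, sample in enumerate(samples) for value in sample)
    n = len(pooled)
    rank_sums = [0.0] * len(samples)
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        size = end - start + 1
        tie_term += size ** 3 - size
        for idx in range(start, end + 1):
            rank_sums[pooled[idx][1]] += avg_rank
        start = end + 1
    h = 12.0 / (n * (n + 1)) * sum(
        r ** 2 / len(s) for r, s in zip(rank_sums, samples)
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))


def chi2_sf_1df(h):
    """Upper tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(h / 2))


# Hypothetical covariate-adjusted residuals for two treatments.
h = kruskal_wallis_h([[-0.9, -0.4, -0.1], [0.2, 0.5, 0.7]])
p_value = chi2_sf_1df(h)
```

Only the ranks of the residuals enter the statistic, which is why neither normality nor HOV is needed for the test itself.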



3 Comparison of the Methods 

ANOVA with or without HOV or K-W test on covariate-adjusted residuals and ANCOVA can be compared
when the treatment-specific lines and the overall regression line are parallel. The null hypotheses tested by
"ANCOVA", "ANOVA with or without HOV", and "K-W test" on covariate-adjusted residuals are

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_t \quad \text{(Intercepts are equal for all treatments.)} \tag{7}$$

$$H_0: \rho_1 = \rho_2 = \cdots = \rho_t \quad \text{(Residual means are equal for all treatments.)} \tag{8}$$

and

$$H_0: F_{R_1} = F_{R_2} = \cdots = F_{R_t} \quad \text{(Residuals have the same distribution for all treatments.)} \tag{9}$$

respectively.

For more than two treatments the assumption of parallelism is less likely to hold, since only two lines with
different slopes are sufficient to violate the condition. With two treatments, the null hypotheses tested by
ANCOVA, ANOVA with or without HOV, and K-W test on covariate-adjusted residuals will be

$$H_0: \mu_1 = \mu_2 \quad (\text{or } \mu_1 - \mu_2 = 0), \tag{10}$$

$$H_0: \rho_1 = \rho_2 \quad (\text{or } \rho_1 - \rho_2 = 0), \tag{11}$$

and

$$H_0: F_{R_1} = F_{R_2} \quad (\text{or } R_1 \stackrel{d}{=} R_2), \tag{12}$$

respectively, where $\stackrel{d}{=}$ stands for "equal in distribution".

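In Equation (11), $\rho_i$ can be estimated by the sample residual mean, $\bar{R}_{i..}$. Combining the expressions in
(4) and (5), the residuals can be rewritten as

$$R_{ijk} = \rho_i + \varepsilon_{ijk} = Y_{ijk} - \widehat{Y}_{ij} = \left(\mu_i + \beta_i X_{ij} + e_{ijk}\right) - \left(\widehat{\mu} + \widehat{\beta}^* X_{ij}\right),$$

$i = 1, 2$, $j = 1, 2, \ldots, n_i$, and $k = 1, 2, \ldots, r_{ij}$. Averaging the residuals for treatment $i$ yields

$$\bar{R}_{i..} = \rho_i + \bar{\varepsilon}_{i..} = \mu_i + \beta_i \bar{X}_{i.} + \bar{e}_{i..} - \widehat{\mu} - \widehat{\beta}^* \bar{X}_{i.}, \quad i = 1, 2, \tag{13}$$

where $\bar{X}_{i.}$ is the sample mean of covariate values for treatment $i$,
$\bar{e}_{i..} = \sum_{j=1}^{n_i}\sum_{k=1}^{r_{ij}} e_{ijk}/n_i$ and
$\bar{\varepsilon}_{i..} = \sum_{j=1}^{n_i}\sum_{k=1}^{r_{ij}} \varepsilon_{ijk}/n_i$, $i = 1, 2$. Under the assumptions of
ANCOVA and ANOVA (with or without HOV) on covariate-adjusted residuals, taking the expectations in (13) yields

$$E\left[\bar{R}_{i..}\right] = \rho_i = \mu_i + \beta_i \bar{X}_{i.} - \mu - \beta^* \bar{X}_{i.} = \left(\mu_i - \mu\right) + \left(\beta_i - \beta^*\right)\bar{X}_{i.}, \quad i = 1, 2, \tag{14}$$

since $E[\bar{e}_{i..}] = 0$ and $E[\bar{\varepsilon}_{i..}] = 0$, for $i = 1, 2$. Hence $H_0$ in (11) can be rewritten as
$H_0: (\mu_1 - \mu_2) + (\beta_1 - \beta^*)\bar{X}_{1.} - (\beta_2 - \beta^*)\bar{X}_{2.} = 0$. Then the hypotheses in
Equations (10) and (11) are equivalent iff

$$\left(\beta_1 - \beta^*\right)\bar{X}_{1.} = \left(\beta_2 - \beta^*\right)\bar{X}_{2.} \tag{15}$$

Using condition (6) and repeating the above argument for all pairs of treatments, the condition in (15) can
be extended to more than two treatments.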
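A small noise-free example may help to see what Equation (14) implies for parallel lines. In the sketch below, two treatments follow lines with common slope 2 and intercepts 1 and 3 (values chosen only for illustration). When both treatments share the same covariate values, the difference of mean covariate-adjusted residuals equals the intercept difference; when the covariate ranges are shifted, the overall slope absorbs part of the treatment effect and the difference shrinks. The function names are hypothetical.

```python
def ols(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_mean_gap(x1, x2, mu1=1.0, mu2=3.0, beta=2.0):
    """Mean residual of treatment 2 minus treatment 1, and the overall slope.

    Responses are noise-free points on two parallel lines; residuals come from
    a single regression line fitted to the pooled data.
    """
    y1 = [mu1 + beta * x for x in x1]
    y2 = [mu2 + beta * x for x in x2]
    a, b = ols(x1 + x2, y1 + y2)
    r1 = [y - a - b * x for x, y in zip(x1, y1)]
    r2 = [y - a - b * x for x, y in zip(x2, y2)]
    return sum(r2) / len(r2) - sum(r1) / len(r1), b


# Same covariate values in both treatments: overall slope equals the common slope.
gap_equal, slope_equal = residual_mean_gap([2.0, 4.0, 6.0, 8.0], [2.0, 4.0, 6.0, 8.0])

# Shifted covariate ranges: overall slope exceeds 2 and the gap falls below 2.
gap_shifted, slope_shifted = residual_mean_gap([0.0, 2.0, 4.0, 6.0], [4.0, 6.0, 8.0, 10.0])
```

In the shifted case the gap equals $(\mu_2 - \mu_1) + (\beta - \widehat{\beta}^*)(\bar{X}_{2.} - \bar{X}_{1.})$, which is the two-treatment difference of the right-hand side of Equation (14) with a common slope.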

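Notice that the conditions that will imply (15) will also imply the equivalence of the hypotheses in (10)
and (11). The overall regression slope can be estimated as

$$\widehat{\beta}^* = \frac{\sum_{i=1}^{2}\sum_{j=1}^{n_i}\sum_{k=1}^{r_{ij}} \left(X_{ij} - \bar{X}_{..}\right)\left(Y_{ijk} - \bar{Y}_{...}\right)}{E^*_{xx}} = \frac{\sum_{i=1}^{2}\sum_{j=1}^{n_i}\sum_{k=1}^{r_{ij}} \left(X_{ij} - \bar{X}_{..}\right) Y_{ijk}}{E^*_{xx}} \tag{16}$$

where $\bar{X}_{..}$ is the overall covariate mean, $\bar{Y}_{...}$ is the overall response mean, and

$$E^*_{xx} = \sum_{i=1}^{2}\sum_{j=1}^{n_i} \left(X_{ij} - \bar{X}_{..}\right)^2 = \sum_{i=1}^{2}\sum_{j=1}^{n_i} \left(X_{ij} - \bar{X}_{..}\right) X_{ij}.$$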

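Furthermore the treatment-specific slope used in model (1) is estimated as

$$\widehat{\beta}_i = \frac{\sum_{j=1}^{n_i}\sum_{k=1}^{r_{ij}} \left(X_{ij} - \bar{X}_{i.}\right)\left(Y_{ijk} - \bar{Y}_{i..}\right)}{E_{xx,i}},$$

where $E_{xx,i} = \sum_{j=1}^{n_i} \left(X_{ij} - \bar{X}_{i.}\right)^2$, and $\bar{Y}_{i..}$ is the mean response for treatment $i$.
Substituting $Y_{ijk} = \widehat{\mu}_i + \widehat{\beta}_i X_{ij} + R'_{ijk}$, $i = 1, 2$, $j = 1, 2, \ldots, n_i$, and
$k = 1, 2, \ldots, r_{ij}$ in Equation (16), where $\widehat{\mu}_i$ is the estimated intercept for treatment level $i$, and
$R'_{ijk}$ is the $k$th residual at $X_{ij}$ in model (1), the estimated overall slope becomes

$$\widehat{\beta}^* = \frac{\sum_{i=1}^{2}\sum_{j=1}^{n_i}\sum_{k=1}^{r_{ij}} \left(X_{ij} - \bar{X}_{..}\right)\left(\widehat{\mu}_i + \widehat{\beta}_i X_{ij} + R'_{ijk}\right)}{E^*_{xx}}.$$

With some rearrangements, we get

$$\widehat{\beta}^* = \widehat{\beta}_i + \frac{\widehat{\mu}_1 \sum_{j=1}^{n_1}\left(X_{1j} - \bar{X}_{..}\right) + \widehat{\mu}_2 \sum_{j=1}^{n_2}\left(X_{2j} - \bar{X}_{..}\right)}{E^*_{xx}} \tag{17}$$

since $\sum_{i=1}^{2}\sum_{j=1}^{n_i}\sum_{k=1}^{r_{ij}} X_{ij} R'_{ijk} = 0$. As $E[R'_{ijk}] = 0$, taking the expectations of both
sides of (17) yields

$$\beta^* = \beta_i + \frac{\mu_1 \sum_{j=1}^{n_1}\left(X_{1j} - \bar{X}_{..}\right) + \mu_2 \sum_{j=1}^{n_2}\left(X_{2j} - \bar{X}_{..}\right)}{E^*_{xx}} = \beta_i + \frac{\mu_1 n_1 \left(\bar{X}_{1.} - \bar{X}_{..}\right) + \mu_2 n_2 \left(\bar{X}_{2.} - \bar{X}_{..}\right)}{E^*_{xx}}. \tag{18}$$

Under $H_0: \mu_1 = \mu_2$, (18) reduces to

$$\beta^* = \beta_i + \mu_1\, \frac{n_1 \left(\bar{X}_{1.} - \bar{X}_{..}\right) + n_2 \left(\bar{X}_{2.} - \bar{X}_{..}\right)}{E^*_{xx}} \tag{19}$$

provided that $E^*_{xx} \neq 0$. Indeed, $E^*_{xx} = 0$ will hold if and only if all $X_{ij}$ are equal to a constant for each
treatment $i$, in which case $\widehat{\beta}^*$ and $\widehat{\beta}_i$ will both be undefined. In (19), $\beta^* = \beta_i$ holds if
$\bar{X}_{1.} = \bar{X}_{2.} = \bar{X}_{..}$. Recall that $H_0: \rho_1 = \rho_2$ was shown to be equivalent to $H_0: \mu_1 = \mu_2$
provided that $(\beta_1 - \beta^*)\bar{X}_{1.} = (\beta_2 - \beta^*)\bar{X}_{2.}$, which holds if $\bar{X}_{1.} = \bar{X}_{2.}$ and
$\beta_1 = \beta_2$. So the null hypotheses in (10) and (11) are equivalent when the
treatment-specific lines are parallel and treatment-specific covariate means are equal, which implies the condition stated
in (6).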

In general for $t$ treatments, the hypotheses in (7) and (8) can be tested using $F$ test statistics. $H_0$ in
(7) can be tested by

$$F = \frac{MSTrt}{MSE}, \tag{20}$$

where $MSTrt$ is the mean square treatment for response values, and $MSE$ is the mean square error for
response values. These mean square terms can be calculated as:

$$MSTrt = \frac{\sum_{i=1}^{t} n_i \left[\left(\bar{Y}_{i..} - \bar{Y}_{...}\right) - \widehat{\beta}\left(\bar{X}_{i.} - \bar{X}_{..}\right)\right]^2}{(t-1)}$$

and

$$MSE = \frac{\sum_{i=1}^{t}\sum_{j=1}^{n_i}\sum_{k=1}^{r_{ij}} \left[\left(Y_{ijk} - \bar{Y}_{i..}\right) - \widehat{\beta}\left(X_{ij} - \bar{X}_{i.}\right)\right]^2}{(n-(t+1))}.$$

Note that $MSE$ has $(n - t - 1)$ degrees of freedom ($df$) since there are $(t + 1)$ parameters ($\mu_i$ for $i = 1, 2, \ldots, t$
and $\beta$) to estimate. Therefore the test statistic in (20) is distributed as $F \sim F(t-1, n-t-1)$.

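Similarly, $H_0$ in (8) can be tested by

$$F^* = \frac{MSTrt^*}{MSE^*}, \tag{21}$$

where $MSTrt^*$ is the mean square treatment for covariate-adjusted residuals, and $MSE^*$ is the mean
square error for covariate-adjusted residuals. These mean square terms can be calculated as

$$MSTrt^* = \frac{\sum_{i=1}^{t} n_i \left(\bar{R}_{i..} - \bar{R}_{...}\right)^2}{(t-1)} \quad \text{and} \quad MSE^* = \frac{\sum_{i=1}^{t}\sum_{j=1}^{n_i}\sum_{k=1}^{r_{ij}} \left(R_{ijk} - \bar{R}_{i..}\right)^2}{(n-t)}.$$

Using $\bar{R}_{i..} = \bar{Y}_{i..} - \widehat{\mu} - \widehat{\beta}^* \bar{X}_{i.}$, $i = 1, 2, \ldots, t$, and
$\bar{R}_{...} = \bar{Y}_{...} - \widehat{\mu} - \widehat{\beta}^* \bar{X}_{..}$, these become

$$MSTrt^* = \frac{\sum_{i=1}^{t} n_i \left[\left(\bar{Y}_{i..} - \bar{Y}_{...}\right) - \widehat{\beta}^*\left(\bar{X}_{i.} - \bar{X}_{..}\right)\right]^2}{(t-1)}$$

and

$$MSE^* = \frac{\sum_{i=1}^{t}\sum_{j=1}^{n_i}\sum_{k=1}^{r_{ij}} \left[\left(Y_{ijk} - \bar{Y}_{i..}\right) - \widehat{\beta}^*\left(X_{ij} - \bar{X}_{i.}\right)\right]^2}{(n-t)}.$$

It might seem that $MSE^*$ has $(n - t)$ degrees of freedom ($df$), since there are $t$ parameters ($\rho_i$ for
$i = 1, 2, \ldots, t$) to estimate, so that the test statistic in Equation (21) is distributed as $F^* \sim F(t-1, n-t)$.
However, there is one more restriction in test (21). Since $\sum_{i=1}^{t}\sum_{j=1}^{n_i}\sum_{k=1}^{r_{ij}} R_{ijk} = 0$,
$F^*$ should actually be distributed as $F^* \sim F(t-1, n-t-1)$. Atchley et al. did not suggest this adjustment in $df$,
and Beaupre and Duvall (1998) used the method without such an adjustment. That is, in both sources
$F(t-1, n-t)$ is used for inference. So, in this article $df$ for $MSE^*$ has been set at $(n - t)$ as in literature.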

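Notice that the $F$-statistics in (20) and (21) become

$$F = \frac{\sum_{i=1}^{t} n_i \left[\left(\bar{Y}_{i..} - \bar{Y}_{...}\right) - \widehat{\beta}\left(\bar{X}_{i.} - \bar{X}_{..}\right)\right]^2 / (t-1)}{\sum_{i=1}^{t}\sum_{j=1}^{n_i}\sum_{k=1}^{r_{ij}} \left[\left(Y_{ijk} - \bar{Y}_{i..}\right) - \widehat{\beta}\left(X_{ij} - \bar{X}_{i.}\right)\right]^2 / (n-(t+1))} \tag{22}$$

and

$$F^* = \frac{\sum_{i=1}^{t} n_i \left[\left(\bar{Y}_{i..} - \bar{Y}_{...}\right) - \widehat{\beta}^*\left(\bar{X}_{i.} - \bar{X}_{..}\right)\right]^2 / (t-1)}{\sum_{i=1}^{t}\sum_{j=1}^{n_i}\sum_{k=1}^{r_{ij}} \left[\left(Y_{ijk} - \bar{Y}_{i..}\right) - \widehat{\beta}^*\left(X_{ij} - \bar{X}_{i.}\right)\right]^2 / (n-t)} \tag{23}$$

respectively. For two treatments, $t = 2$ will be used in Equations (22) and (23); then the test statistics will
be distributed as $F \sim F(1, n-3)$ and $F^* \sim F(1, n-2)$, and they can be used to test the hypotheses in
Equations (10) and (11), respectively. Furthermore, with two treatments, note that $F \stackrel{d}{=} T^2(n-3)$ and
$F^* \stackrel{d}{=} T^2(n-2)$, where $T(n)$ is the $t$-distribution with $n$ $df$. As $n \to \infty$, both $F$ and $F^*$ will
converge in distribution to $\chi^2_1$. So $F$ and $F^*$ will have similar observed significance levels (i.e., $p$-values) and
similar scores for large $n$. Similar decisions for testing (10) and (11) will be reached if the calculated test statistics
are similar; i.e., $F \approx F^*$ for large $n$. Likewise, $F$ in (20) and $F^*$ in (21) will have similar distributions for
large $n$.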
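The two statistics in Equations (22) and (23) can be computed side by side as in the sketch below, which assumes one replicate per covariate value, takes $\widehat{\beta}$ to be the pooled within-treatment slope of the parallel lines model, and takes $\widehat{\beta}^*$ to be the slope of the overall regression. The function name `f_statistics` and the data are assumptions made for the example.

```python
def f_statistics(groups):
    """Return (F, F*) following Equations (22) and (23), one replicate per covariate value.

    groups: list of (x_values, y_values) pairs, one per treatment level.
    """
    t = len(groups)
    n = sum(len(x) for x, _ in groups)
    all_x = [xi for x, _ in groups for xi in x]
    all_y = [yi for _, y in groups for yi in y]
    x_all = sum(all_x) / n
    y_all = sum(all_y) / n
    means = [(sum(x) / len(x), sum(y) / len(y)) for x, y in groups]
    paired = list(zip(groups, means))

    # Common within-treatment slope (parallel lines model).
    within_sxy = sum((xi - xb) * (yi - yb)
                     for (x, y), (xb, yb) in paired for xi, yi in zip(x, y))
    within_sxx = sum((xi - xb) ** 2 for (x, _y), (xb, _yb) in paired for xi in x)
    beta_hat = within_sxy / within_sxx

    # Slope of the single regression line fitted to the pooled data.
    beta_star = (sum((xi - x_all) * (yi - y_all) for xi, yi in zip(all_x, all_y))
                 / sum((xi - x_all) ** 2 for xi in all_x))

    def f_ratio(slope, error_df):
        ms_trt = sum(len(x) * ((yb - y_all) - slope * (xb - x_all)) ** 2
                     for (x, _y), (xb, yb) in paired) / (t - 1)
        ms_err = sum(((yi - yb) - slope * (xi - xb)) ** 2
                     for (x, y), (xb, yb) in paired
                     for xi, yi in zip(x, y)) / error_df
        return ms_trt / ms_err

    return f_ratio(beta_hat, n - t - 1), f_ratio(beta_star, n - t)


# Hypothetical data: both treatments observed at the same covariate values.
x = [2.0, 4.0, 6.0, 8.0]
groups = [(x, [5.1, 8.9, 13.2, 16.8]), (x, [7.0, 11.1, 14.9, 19.2])]
f_ancova, f_residual = f_statistics(groups)
```

When the treatments share the same covariate values, the two slopes coincide and the statistics differ only through the error degrees of freedom, so their ratio is $(n-t)/(n-t-1)$.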

For the case of two treatments, comparing $F$ and $F^*$, it can be seen that $F$ and $F^*$ are similar if
$\widehat{\beta}^* \approx \widehat{\beta}_i$ for large $n$. The same argument holds for the test statistics in the general case
of more than two treatments for large $n$. The test statistics will lead to similar decisions if
$\widehat{\beta}^* \approx \widehat{\beta}_i$ as $n$ increases. That is, the overall
regression line fitted to the entire data set should be approximately parallel to the fitted treatment-specific
regression lines for the test statistics $F$ and $F^*$ to be similar. If $\widehat{\beta}^* \approx \widehat{\beta}_i$, then ANOVA
with or without HOV on covariate-adjusted residuals and ANCOVA will give similar results. Consequently, it is expected that
the ANCOVA and ANOVA with HOV or without HOV methods give similar results as treatment-specific
covariate means get closer for the parallel lines case.

The above discussion is based on normality of error terms with HOV. Without HOV the $df$ of the $F$-tests
are calculated with the Satterthwaite approximation (Kutner et al. (2004)). On the other hand, the K-W test
requires neither normality nor HOV, but implies a more general hypothesis $H_0: F_{R_1} = F_{R_2}$, in the sense that
this $H_0$ would imply $H_0: \rho_1 = \rho_2$ without the normality assumption. However, the null hypothesis in Equation
(11) implicitly assumes normality.



4 Monte Carlo Simulation Analysis 

Throughout the simulation only two treatments ($t = 2$) are used for the comparison of methods. In the
simulation, sixteen different cases are considered for comparison (see Table 1).



4.1 Sample Generation for Null and Alternative Models 

Without loss of generality, the slope in model (2) is arbitrarily taken to be 2 and the intercept is chosen to
be 1. So the response values for the treatments are generated as

(i) for treatment 1,
$$Y_{1jk} = 1 + 2 X_{1j} + e_{1jk}, \quad j = 1, 2, \ldots, n_1 \text{ and } k = 1, 2, \ldots, r_{1j}, \tag{24}$$
with $e_{1jk} \stackrel{iid}{\sim} F_1$, where $F_1$ is the error distribution for treatment 1.

(ii) for treatment 2,
$$Y_{2jk} = (1 + 0.02\,q) + 2 X_{2j} + e_{2jk}, \quad j = 1, 2, \ldots, n_2 \text{ and } k = 1, 2, \ldots, r_{2j}, \tag{25}$$
with $e_{2jk} \stackrel{iid}{\sim} F_2$, where $F_2$ is the error distribution for treatment 2 and $q$ is introduced to obtain
separation between the parallel lines. In (24) and (25), $X_{ij}$ is the $j$th generated value of the covariate in treatment $i$,
$Y_{ijk}$ is the response value for treatment level $i$ at $X_{ij}$ for $i = 1, 2$, and $e_{ijk}$ is the $k$th random error term. The
covariate ranges, sample sizes ($n_1$ and $n_2$), error distributions ($F_1$ and $F_2$) for the two treatments, and the
number of replicates (reps) at each value of $X_{ij}$ are summarized in Table 1. In the context of model (2), the
common slope is $\beta = 2$, and $\mu_1 = 1$ and $\mu_2 = (1 + 0.02\,q)$ are the intercepts for treatment levels 1 and 2,
respectively.
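The data-generating step in Equations (24) and (25) can be sketched as below for normal errors and uniformly generated covariates, as in case 1. The function name, its parameters, and the use of Python's `random` module are assumptions of this sketch and do not describe the software used for the simulations.

```python
import random


def generate_sample(q, n1=20, n2=20, reps=1, x_range=(0.0, 10.0),
                    error_sd=1.0, seed=None):
    """Generate one data set from Equations (24) and (25) with normal errors.

    Returns [(x1, y1), (x2, y2)]; each covariate value is repeated `reps` times.
    """
    rng = random.Random(seed)
    intercepts = (1.0, 1.0 + 0.02 * q)  # mu_1 and mu_2
    sample = []
    for mu, n_i in zip(intercepts, (n1, n2)):
        xs, ys = [], []
        for _ in range(n_i):
            x = rng.uniform(*x_range)  # one covariate value
            for _ in range(reps):      # replicates at that covariate value
                xs.append(x)
                ys.append(mu + 2.0 * x + rng.gauss(0.0, error_sd))
        sample.append((xs, ys))
    return sample


# q = 0 gives the null case; larger q separates the two parallel lines.
null_sample = generate_sample(q=0, seed=1)
check_sample = generate_sample(q=5, reps=2, error_sd=0.0, seed=1)
```

Setting `error_sd=0.0` is only a convenience for checking that the generated points fall exactly on the two lines.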

Then as $q$ increases the treatment-specific response means become farther apart at each covariate value
and the power of the tests is expected to increase. The choice of 0.02 for the increments is based on time
and efficiency of the simulation process. $q$ is incremented from 1 to $m_u$ in case $u$, for $u = 1, 2, \ldots, 16$ (Table
1), where $m_u$ is estimated by the standard errors of the intercepts of the treatment-specific regression lines.
In the simulation no further values of $q$ are chosen when the power is expected to approach 1.00, which occurs
when the intercepts are approximately 2.5 standard errors apart, as determined by equating the intercept
difference, $0.02\,q = 2.5\, s_{\widehat{\mu}_i}$, with $q$ replaced by $m_u$. A pilot sample of size 6000 is generated
($q = 0, 1, 2, 3, 4, 5$ with 1000 samples at each $q$), and the maximum of the standard errors of the intercepts is recorded. Then
$m_u = 2.5 \max_i(s_{\widehat{\mu}_i})/0.02$ for $i = 1, 2$ in case $u$.
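The stopping value $m_u$ can be computed from intercept standard errors as in the following sketch. The standard error uses the usual simple linear regression formula $s_{\widehat{\mu}_i} = \sqrt{MSE\,(1/n + \bar{x}^2/S_{xx})}$; the function names and the example numbers are hypothetical and are not values from the pilot study.

```python
import math


def intercept_standard_error(x, y):
    """Standard error of the intercept in a simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    return math.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))


def max_q(intercept_ses, step=0.02, separation=2.5):
    """Largest q: intercepts `separation` standard errors apart, in steps of `step`."""
    return separation * max(intercept_ses) / step


# Hypothetical maximum standard errors for the two treatments.
m_u = max_q([0.3, 0.4])
se_exact = intercept_standard_error([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

With a maximum standard error of 0.4, the rule gives a stopping value of about 50, so the intercept difference $0.02\,q$ would range up to about 1.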

All cases labeled with "a" have one replicate and all cases labeled with "b" have two replicates per covariate
value, henceforth. For example, in case 1a the most general case is simulated with iid $N(0, 1)$ error terms,
and 20 uniformly randomly generated covariate values in the interval (0, 10) for both treatments. In case 1b,
the data are generated as in case 1a with two replicates per covariate value.

In cases 1, 5-8, 9, and 12-16, error variances are homogeneous; in cases 1 and 5-8, error terms are generated
as iid $N(0, 1)$. In case 9, error terms are generated as iid $U(-\sqrt{3}, \sqrt{3})$; in case 12, error terms are iid
$DW(0, 1, 3)$, the double-Weibull distribution with location parameter 0, scale parameter 1, and shape parameter
3, whose pdf is $f(x) = \frac{3}{2} x^2 \exp\left(-|x|^3\right)$ for all $x$; in case 13, error terms are iid
$\sqrt{48}\,(\beta(6, 2) - 3/4)$, where $\beta(6, 2)$ is the Beta distribution with shape parameters 6 and 2 whose pdf is
$f(x) = 42\, x^5 (1 - x)\, I(0 < x < 1)$, where $I(\cdot)$ is the indicator function; in case 14, error terms are iid
$\chi^2_2 - 2$, where $\chi^2_2$ is the chi-square distribution with 2 $df$; in case 15, error terms are iid
$LN(0, 1) - e^{1/2}$, where $LN(0, 1)$ is the log-normal distribution with location parameter 0 and scale parameter 1
whose pdf is $f(x) = \frac{1}{x\sqrt{2\pi}} \exp\left(-\frac{1}{2}(\log x)^2\right) I(x > 0)$; and in
case 16, error terms are iid $N(0, 2)$ for treatment 1 and iid $\chi^2_2 - 2$ for treatment 2.
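The non-normal error distributions listed for cases 9 and 12-15 can be generated with Python's standard `random` module, as in the sketch below. The mapping to `random` methods (for example, $\chi^2_2$ as a gamma variable with shape 1 and scale 2, and the double-Weibull as a randomly signed Weibull variable) is an assumption of this sketch, and the function name is hypothetical.

```python
import math
import random


def make_error_samplers(rng):
    """Zero-mean error generators for the non-normal cases 9 and 12-15."""
    root3 = math.sqrt(3.0)
    return {
        # U(-sqrt(3), sqrt(3)): symmetric, unit variance.
        "case 9": lambda: rng.uniform(-root3, root3),
        # Double-Weibull(0, 1, 3): random sign times Weibull(scale 1, shape 3).
        "case 12": lambda: rng.choice((-1.0, 1.0)) * rng.weibullvariate(1.0, 3.0),
        # sqrt(48) * (Beta(6, 2) - 3/4): centered and scaled to unit variance.
        "case 13": lambda: math.sqrt(48.0) * (rng.betavariate(6.0, 2.0) - 0.75),
        # Chi-square with 2 df (gamma with shape 1, scale 2), centered at 0.
        "case 14": lambda: rng.gammavariate(1.0, 2.0) - 2.0,
        # Log-normal(0, 1) minus its mean exp(1/2).
        "case 15": lambda: rng.lognormvariate(0.0, 1.0) - math.exp(0.5),
    }


rng = random.Random(2024)
samplers = make_error_samplers(rng)
sample_means = {
    name: sum(draw() for _ in range(40000)) / 40000
    for name, draw in samplers.items()
}
case13_draws = [samplers["case 13"]() for _ in range(1000)]
```

Each sampler is centered so that its mean is zero, which matches the normalization described for the asymmetric cases; the sample means computed above should therefore be close to zero.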

In cases 2-4 heterogeneity of variances for normal error terms is introduced either by unequal but constant
variances (case 2), unequal but a combination of constant and $x$-dependent variances (case 3), or equal
and $x$-dependent variances (case 4). In case 10 error terms are iid $U(-\sqrt{3}, \sqrt{3})$ for treatment 1 and iid
$U(-2\sqrt{3}, 2\sqrt{3})$ for treatment 2; in case 11, error terms are iid $U(-\sqrt{3}, \sqrt{3})$ for treatment 1 and
iid $U(-\sqrt{x}, \sqrt{x})$ for treatment 2.

The choice of constant variances is arbitrary, but the error term distributions for constant variance cases are
picked so that their variances are roughly between 1 and 6. However, $x$-dependence of variances is a realistic
but not a general case, since any function of $x$ could have been used. For example, Beaupre and Duvall (1998)
explored the differences in metabolism (O2 consumption) of Western diamondback rattlesnakes with
respect to their sex; the O2 consumption was measured for males, non-reproductive females, and vitellogenic
females. To remove the influence of body mass, which was deemed a covariate, on O2 consumption, ANOVA
with HOV on covariate-adjusted residuals was performed. In their study, the variances of O2 consumption
for sexual groups have a positive correlation with body mass. In this study, $\sqrt{x}$ is taken as the variance term
to simulate such a case. Heterogeneity of variances conditions violate one of the assumptions for ANCOVA
and ANOVA with HOV on covariate-adjusted residuals, and are simulated in order to evaluate the sensitivity
of the methods to such violations. The unequal variances in cases 2 and 3 were arbitrarily assigned to the
treatments since all the other restrictions are the same for treatments in each of these cases. In case 5,
sample sizes different from those of the other cases are taken to see the influence of unequal sample sizes.

In cases 1-8, error terms are generated from a normal distribution. In cases 9-15, non-normal distributions
for error generation are employed. In cases 9-12, the distributions of the error terms are symmetric around
0, while in cases 13-15 the distributions of the error terms are not symmetric around 0. Notice that cases 13-15
are normalized to have zero mean, and furthermore case 13 is scaled to have unit variance. The influence of
non-normality and asymmetry of the distributions is investigated in these cases. In case 16, the influence
of distributional differences (normal vs asymmetric non-normal) in the error term is investigated.
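The centering and scaling of the asymmetric error terms can be sketched as follows. This is only a minimal illustration using Python's standard library; the function name `centered_errors` is hypothetical and is not part of the original simulation code.

```python
import math
import random

def centered_errors(case, n, rng):
    """Draw n zero-mean error terms for the asymmetric cases 13-15."""
    if case == 13:
        # Beta(6, 2) has mean 3/4 and variance 1/48, so the scaled term has unit variance.
        return [math.sqrt(48) * (rng.betavariate(6, 2) - 0.75) for _ in range(n)]
    if case == 14:
        # Chi-square with 2 df is Gamma(shape=1, scale=2), whose mean is 2.
        return [rng.gammavariate(1.0, 2.0) - 2.0 for _ in range(n)]
    if case == 15:
        # LN(0, 1) has mean exp(1/2).
        return [rng.lognormvariate(0.0, 1.0) - math.exp(0.5) for _ in range(n)]
    raise ValueError("only cases 13-15 are sketched here")
```

Subtracting the theoretical mean keeps the intercepts interpretable, so a nonzero rejection rate under the null cannot be attributed to a shifted error mean.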

In cases 1-5 and 9-16, covariates are uniformly randomly generated, without loss of generality, in (0, 10),
hence $\bar{X}_{1.} \approx \bar{X}_{2.}$ is expected to hold. In these cases the influence of replications (or magnitude of equal
sample sizes), heterogeneity of variances, and non-normality of the error terms on the methods is investigated.
Cases 6-8 address the issue of clustering which might result naturally in a data set. Clustering occurs if the
treatments have distinct or partially overlapping ranges of covariates. Extrapolation occurs if the clusters are
distinct or the mean of the covariate is not within the covariate clusters for at least one treatment. In case
6 there is a mild overlap of the covariate clusters for treatments 1 and 2, such that covariates are uniformly
randomly generated within (0, 6) for treatment 1, and (4, 10) for treatment 2, so $\bar{X}_{1.}$ and $\bar{X}_{2.}$ are expected to
be different. In fact, this case is expected to contain the largest difference between $\bar{X}_{1.}$ and $\bar{X}_{2.}$. See Figure
1 for a realization of case 6. In case 7 treatment 1 has two clusters, such that each treatment 1 covariate
is first randomly assigned to either (0,3) or (7,10), then the covariate is uniformly randomly generated in
that interval. Treatment 2 covariates are generated uniformly within the interval (4,10). Note that $\bar{X}_{1.}$
and $\bar{X}_{2.}$ are expected to be very different, but not as much as in case 6. See Figure 2 for a realization of
case 7. Notice that the second cluster of treatment 1 is completely inside the covariate range of treatment
2. These choices of clusters are inspired by the research of Beaupre and Duvall (1998), which dealt with O2
consumption of rattlesnakes. In case 8 treatment 1 has two clusters; each treatment 1 covariate is uniformly
randomly generated in the randomly selected interval of either (0,4) or (6,10). Treatment 2 covariates are
uniformly randomly generated in the interval (3,7). Hence $\bar{X}_{1.}$ and $\bar{X}_{2.}$ are expected to be similar. Notice
that the treatment 2 cluster is in the middle of the treatment 1 clusters with mild overlaps.
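The covariate designs of cases 1 and 6-8 can be sketched as below. The names `COVARIATE_CLUSTERS` and `draw_covariates` are hypothetical, and the sketch assumes that the two treatment 1 clusters in cases 7 and 8 are selected with equal probability, which the text does not state explicitly.

```python
import random

# Covariate supports per treatment; each entry lists the intervals (clusters)
# from which a covariate value may be drawn.
COVARIATE_CLUSTERS = {
    1: ([(0, 10)], [(0, 10)]),
    6: ([(0, 6)], [(4, 10)]),
    7: ([(0, 3), (7, 10)], [(4, 10)]),
    8: ([(0, 4), (6, 10)], [(3, 7)]),
}

def draw_covariates(case, n, rng):
    """Return (x1, x2): n uniform covariate values for each of the two treatments."""
    def draw(intervals):
        lo, hi = rng.choice(intervals)  # assumption: clusters are equally likely
        return rng.uniform(lo, hi)
    support1, support2 = COVARIATE_CLUSTERS[case]
    return [draw(support1) for _ in range(n)], [draw(support2) for _ in range(n)]

def mean(values):
    return sum(values) / len(values)
```

Under these assumptions the expected covariate means are 3 and 7 in case 6, 5 and 7 in case 7, and 5 and 5 in case 8, which matches the ordering of mean differences described above.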



4.2 Monte Carlo Simulation Results 



In this section, the empirical size and power comparisons for the methods discussed are presented. 



4.2.1 Empirical Size Comparisons 



In the simulation process, to estimate the empirical sizes of the methods in question, for each case enumerated
in Table 1, $N_{mc} = 10000$ samples are generated with $q = 0$ using the model relationships given above. Out of
these 10000 samples the number of significant treatment differences detected by the methods is recorded. The
number of differences detected concurrently by each pair of methods is also recorded. The nominal significance
level used in all these tests is $\alpha = 0.05$. Based on these detected differences, empirical sizes are calculated as
$\hat{\alpha}_i = \nu_i / N_{mc}$, where $\nu_i$ is the number of significant treatment differences detected by method $i$, with method 1 being
ANCOVA, method 2 being ANOVA with HOV, method 3 being ANOVA without HOV, and method 4 being
K-W test on covariate-adjusted residuals. Furthermore, the proportion of differences detected concurrently by
each pair of methods is $\hat{\alpha}_{i,j} = \nu_{i,j} / N_{mc}$, where $N_{mc} = 10000$ and $\nu_{i,j}$ is the number of significant treatment
differences detected by both methods $i$ and $j$, with $i \neq j$. For large $N_{mc}$, $\hat{\alpha}_i \sim N(\alpha_i, \sigma^2_{\alpha_i})$, $i = 1,2,3,4$, where $\sim$ stands
for "approximately distributed as", $\alpha_i$ is the proportion of treatment differences, and $\sigma^2_{\alpha_i} = \alpha_i(1 - \alpha_i)/N_{mc}$ is the
variance of the unknown proportion $\alpha_i$, whose estimate is $\hat{\alpha}_i$. Using the asymptotic normality of proportions
for large $N_{mc}$, the 95% confidence intervals are constructed for empirical sizes of the methods (not presented)
to see whether they contain the nominal significance level, 0.05, and the 95% confidence interval for the
difference in the proportions (not presented either) to check whether the sizes are significantly different from
each other.
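The size and agreement estimates described above reduce to simple proportions. The following is a minimal sketch with hypothetical function names; it uses the normal-approximation interval with z = 1.96 and should not be read as the exact decision rule used to flag entries in the tables.

```python
import math

def empirical_size(n_reject, n_mc=10000, z=1.96, nominal=0.05):
    """Empirical size, its large-sample 95% interval, and whether it covers the nominal level."""
    size = n_reject / n_mc
    half_width = z * math.sqrt(size * (1.0 - size) / n_mc)
    lower, upper = size - half_width, size + half_width
    return size, (lower, upper), lower <= nominal <= upper

def agreement(reject_i, reject_j):
    """Proportion of samples in which both methods reject the null hypothesis."""
    both = sum(1 for a, b in zip(reject_i, reject_j) if a and b)
    return both / len(reject_i)
```

For instance, with illustrative counts, `empirical_size(531)` gives an interval that covers 0.05, whereas `empirical_size(612)` gives one that lies entirely above 0.05. Since both methods must reject for a sample to count toward agreement, `agreement` can never exceed the smaller of the two empirical sizes.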

The empirical size estimates in cases 1a-16a and 1b-2b are presented in Table 2. Observe that the ANCOVA
method is liberal in case 2a and conservative in cases 14a and 15a, and has the desired nominal level 0.05
for the other cases. The liberalness in case 2a weakens as the number of replicates is doubled (see case 2b).
ANOVA with or without HOV are liberal in cases 1a, 2a, and 3a, and conservative in cases 6a-8a and 14a-15a,
and have the desired nominal level for the other cases. However, the liberalness of the tests weakens as
the number of replicates is doubled (see cases 1b-3b). K-W test is liberal in cases 1a-3a, 10a, 11a,
and 16a, and conservative in cases 6a, 7a, and 14a, and has the desired nominal level for the other cases.
Liberalness of the test in case 1a weakens as the number of replicates is doubled (see case 1b). Notice that
the ANCOVA method has the desired size when the error term is normally distributed or has a symmetric
distribution, tends to be slightly liberal when HOV is violated, and is conservative when the error distribution
is non-normal and not symmetric. On the other hand, ANOVA with or without HOV have about the same
size for all cases. Both methods have the desired size when error terms are normally distributed, or have
a symmetric distribution, and the covariates have similar means. When error terms are normal without HOV,
both methods are liberal, with ANOVA without HOV being less liberal. When error terms are non-normal
with asymmetric distributions, both methods tend to be slightly conservative. But, when the covariate means
are extremely different, both methods are extremely conservative (see cases 6 and 7). See Figure 3 for the
empirical size estimates for ANCOVA and ANOVA with HOV on covariate-adjusted residuals as a function
of distance between treatment-specific means. As the distance between treatment-specific means increases,
the empirical size for the ANOVA with HOV on covariate-adjusted residuals decreases, while the empirical
size for ANCOVA is stable about the desired nominal level 0.05. K-W test has the desired level when error
terms have symmetric and identical distributions, is liberal when errors have the same distribution without
HOV or have different distributions, and is conservative when errors have asymmetric distributions, provided the
covariates have similar means. But when the covariate means are very different, K-W test is also extremely
conservative (see cases 6 and 7).

Moreover, observe that when the covariates have similar means, ANCOVA and ANOVA (with or without
HOV) methods have similar empirical sizes. These three methods have sizes similar to that of the K-W test when
the error distributions have HOV. Without HOV, K-W test has significantly larger empirical size. When
the covariate means are considerably different, the ANCOVA method has significantly larger size than the others.
ANOVA with or without HOV methods have similar empirical sizes for all cases.
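The contrast between ANCOVA and ANOVA on covariate-adjusted residuals under clustered covariates can be reproduced in a small Monte Carlo sketch. Everything below is an illustration under stated assumptions rather than the original simulation code: the function names are hypothetical, the null model y = 1 + 2x + N(0,1) error is an arbitrary choice, only 2000 samples are drawn, and the 0.95 quantiles of F(1, 38) and F(1, 37) are hard-coded for n = 20 per treatment.

```python
import random

# 0.95 quantiles of F(1, 38) and F(1, 37), hard-coded for n = 20 per treatment.
F_CRIT_RESIDUAL_ANOVA = 4.098
F_CRIT_ANCOVA = 4.105

def cross(u, v):
    """Centered cross-product sum of u and v (a sum of squares when u is v)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def overall_residuals(x, y):
    """Residuals from one regression line fitted to all data, ignoring treatments."""
    slope = cross(x, y) / cross(x, x)
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return [b - intercept - slope * a for a, b in zip(x, y)]

def anova_f(r1, r2):
    """One-way ANOVA F statistic (equal variances assumed) for two groups."""
    n = len(r1) + len(r2)
    grand = (sum(r1) + sum(r2)) / n
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in (r1, r2))
    within = cross(r1, r1) + cross(r2, r2)
    return between / (within / (n - 2))

def ancova_f(x1, y1, x2, y2):
    """ANCOVA F statistic for equal intercepts, assuming a common slope."""
    n = len(x1) + len(x2)
    pooled_sxy = cross(x1, y1) + cross(x2, y2)
    pooled_sxx = cross(x1, x1) + cross(x2, x2)
    sse_full = cross(y1, y1) + cross(y2, y2) - pooled_sxy ** 2 / pooled_sxx
    x, y = x1 + x2, y1 + y2
    sse_reduced = cross(y, y) - cross(x, y) ** 2 / cross(x, x)
    return (sse_reduced - sse_full) / (sse_full / (n - 3))

def empirical_sizes(range1, range2, n=20, n_mc=2000, seed=0):
    """Rejection rates of (ANCOVA, ANOVA on residuals) under the null hypothesis."""
    rng = random.Random(seed)
    reject_ancova = reject_residual = 0
    for _ in range(n_mc):
        x1 = [rng.uniform(*range1) for _ in range(n)]
        x2 = [rng.uniform(*range2) for _ in range(n)]
        # Assumed null model: both treatments follow y = 1 + 2x + N(0, 1) error.
        y1 = [1.0 + 2.0 * a + rng.gauss(0.0, 1.0) for a in x1]
        y2 = [1.0 + 2.0 * a + rng.gauss(0.0, 1.0) for a in x2]
        r = overall_residuals(x1 + x2, y1 + y2)
        reject_residual += anova_f(r[:n], r[n:]) > F_CRIT_RESIDUAL_ANOVA
        reject_ancova += ancova_f(x1, y1, x2, y2) > F_CRIT_ANCOVA
    return reject_ancova / n_mc, reject_residual / n_mc
```

Calling `empirical_sizes((0, 10), (0, 10))` mimics the covariate layout of case 1 and `empirical_sizes((0, 6), (4, 10))` that of case 6. In the second layout the rejection rate of ANOVA on residuals should fall far below the nominal level while that of ANCOVA stays near it.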

As seen in Table 3, the proportion of agreement between the empirical size estimates is usually not
significantly different from the minimum of each pair of tests for ANCOVA and ANOVA with or without HOV,
but the proportion of agreement is usually significantly smaller for the cases in which K-W test is compared
with others. Therefore, when covariate means are similar, ANCOVA and ANOVA with or without HOV
have the same null hypothesis, with similar acceptance/rejection regions, while K-W test has a different null
hypothesis, hence different acceptance/rejection regions. When covariate means are different, ANCOVA and
ANOVA methods have different acceptance/rejection regions, and K-W test has a different null hypothesis.
Both ANOVA methods have the same null hypothesis, and have similar acceptance/rejection regions for this
simulation study.
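Because the K-W test works on ranks rather than on means, its rejection region differs from those of the parametric tests. A minimal sketch of the statistic for two groups of covariate-adjusted residuals is given below. The function names are hypothetical, no tie correction is applied, and the p-value uses the chi-square approximation with 1 df, so the sketch covers only the two-treatment setting.

```python
import math

def average_ranks(values):
    """Ranks 1..N of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def kruskal_wallis_two_groups(r1, r2):
    """K-W statistic H (no tie correction) and its chi-square(1) p-value."""
    ranks = average_ranks(list(r1) + list(r2))
    n1, n2 = len(r1), len(r2)
    n = n1 + n2
    rank_sum1, rank_sum2 = sum(ranks[:n1]), sum(ranks[n1:])
    h = 12.0 / (n * (n + 1)) * (rank_sum1 ** 2 / n1 + rank_sum2 ** 2 / n2) - 3.0 * (n + 1)
    # Survival function of a chi-square variable with 1 df, written via erfc.
    p_value = math.erfc(math.sqrt(max(h, 0.0) / 2.0))
    return h, p_value
```

The statistic depends on the residuals only through their ranks, so any monotone transformation of the residuals leaves the result unchanged.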

4.2.2 Empirical Power Comparisons 

The empirical power curves are plotted in Figures 4, 5, 6, and 7. Empirical power corresponds to $q = 1, 2, \ldots$
The value on the horizontal axis is defined to be the intercept difference (i.e., $0.02\,q$), as defined in the simulation setup. Then the empirical
power curves are plotted against the simulated intercept difference values. In these figures the empirical power
curve for a case labeled with "b" is steeper and approaches 1.00 faster than that of the case labeled with
"a" for the same case number, since "b"-labeled cases have twice as many replicates, with the rest of the
restrictions identical to the preceding "a"-labeled cases. Only for case 1 are both the "a"- and "b"-labeled cases
presented, in Figure 4. For other cases, the plot for only the "a"-labeled case is presented.

The first intercept difference values at which the power reaches 1 are denoted as $\kappa$ and are provided in
Table 4 for all cases. Observe also that power curves are steeper when error variances are smaller. The
empirical power curves are almost identical for all methods in case 13, which has a scaled Beta distribution for
the error term. That is, in this case the conditions balance out the power estimates for the methods. In cases
1, 9-11, and 16 the power estimates for ANCOVA and ANOVA methods are similar but all are larger than
the K-W test power estimates. In these cases, except in cases 11 and 16, the error distributions are identical
for both treatment levels, and are all symmetric; furthermore, the uniform distribution, which approaches asymptotic
normality considerably fast, satisfies all the assumptions of the parametric tests. In cases 3, 4, 14, and 15
power estimates for ANCOVA and ANOVA methods are similar but all are smaller than the K-W test power
estimates. In these cases, either HOV is violated as in cases 3 and 4, or normality is violated as in cases
14 and 15 with the error distribution being asymmetric. Since K-W test is non-parametric, it is robust to
non-normality, and since it tests distributional equality, it is more sensitive to HOV in normal cases. In case
5, power estimates of ANCOVA and ANOVA with HOV are similar, with both being larger than that of
ANOVA without HOV, whose power estimate is larger than that of K-W test. In this case, the sample sizes
for the treatments are different with everything else being the same. In cases 6-8, the power estimate of the ANCOVA
method is significantly larger than those of the ANOVA methods, whose power estimates are larger than that of
K-W test. In these cases, the covariates are clustered with very different treatment-specific means in cases 6
and 7, and similar means in case 8. In cases 2 and 12, for smaller values of intercept difference (i.e., between
0 and 0.5 in case 2 and 0 and 0.8 in case 12), ANCOVA and ANOVA methods have similar power with all having
a smaller power than that of K-W test, while for larger values of the intercept difference (i.e., between 0.5 and
4 in case 2 and 0.8 and 2 in case 12), the order is reversed for the power estimates. In case 2, error terms have
different but constant variances, and in case 12, error terms are non-normal but symmetric.
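The summary value reported for each power curve, namely the first intercept difference at which the estimated power reaches 1, can be read off a curve as sketched below. The function name is hypothetical, and the sketch assumes the power estimates are stored in a list indexed by q = 0, 1, 2, ..., with intercept difference 0.02 q.

```python
def first_full_power(power_by_q, step=0.02, tol=1e-12):
    """Smallest intercept difference step * q at which the estimated power reaches 1."""
    for q, power in enumerate(power_by_q):
        if power >= 1.0 - tol:
            return step * q
    return None  # the power never reaches 1 on this grid
```

A smaller returned value corresponds to a steeper power curve, so the value gives a one-number comparison of methods within a case.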



5 Discussion and Conclusions 



In this article, we discuss various methods to remove the covariate influence on a response variable when testing
for differences between treatment levels. The methods considered are the usual ANCOVA method and the
analysis of covariate-adjusted residuals using ANOVA with or without homogeneity of variances (HOV) and
Kruskal-Wallis (K-W) test. The covariate-adjusted residuals are obtained from the fitted overall regression
line to the entire data set (ignoring the treatment levels). For covariate-adjusted residuals to be appropriate
for removing the covariate influence, the treatment-specific lines and the overall regression line should be
parallel. On the other hand, ANCOVA can be used to test the equality of treatment means at specific values
of the covariate. Furthermore, the use of ANCOVA is extended to the nonparallel treatment-specific lines
also (Kowalski et al. (1994)).



The Monte Carlo simulations indicate that when the covariates have similar means and have similar
distributions (with or without HOV), ANCOVA and ANOVA with or without HOV methods have similar empirical
sizes; and K-W test is sensitive to distributional differences, since the null hypotheses for the first three
tests are about the same while it is more general for K-W test. When the treatment-specific lines are parallel
and treatment-specific covariate ranges and covariate distributions are similar, ANCOVA and ANOVA with or
without HOV on covariate-adjusted residuals give similar results if error terms have symmetric
distributions with or without HOV and sample sizes are similar for treatments; they also give similar results if
error variances are homogeneous and sample sizes are different but large for treatments. In these situations,
parametric tests are more powerful than K-W test. The methods give similar results but are liberal if error variances are
heterogeneous with different functional forms for treatments. In these cases, usually K-W test has better
performance.

When the treatment-specific lines are parallel, but treatment-specific covariate ranges are different, i.e.,
there exists clustering of the covariate relative to the treatment factors, ANCOVA and ANOVA on
covariate-adjusted residuals yield similar results if treatment-specific covariate means are similar, and very different results
if treatment-specific covariate means are different, since the overall regression line will not be parallel to the
treatment-specific lines. In such a case, methods on covariate-adjusted residuals tend to be extremely
conservative whereas the size of the ANCOVA F test is about the desired nominal level. Moreover, ANCOVA is
much more powerful than ANOVA on covariate-adjusted residuals in these cases. The power of ANOVA on
covariate-adjusted residuals gets closer to that of ANCOVA as the difference between the treatment-specific
covariate means gets smaller. However, in the case of clustering of covariates relative to the treatments, one
should also exercise extra caution due to the extrapolation problem. Moreover, in practice, such clustering is
suggestive of an ignored grouping factor as in blocking. The discussed methods are meaningful only within
the overlap of the clusters or in the close vicinity of them. However, when there are clusters for the groups in
terms of the covariate, it is very likely that the covariate and the group factors are dependent, which violates an
assumption for ANCOVA. When this dependence is strong, the ANCOVA method will not be appropriate.
On the other hand, the residual analysis is extremely conservative, which might be viewed as an advantage in
order not to reach spurious and confounded conclusions in such a case.
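The reason the overall regression line fails to be parallel to the treatment-specific lines when covariate means differ can be made explicit with a short derivation. The notation below is assumed for illustration only: parallel treatment-specific lines $Y_{ij} = \mu_i + \beta X_{ij} + e_{ij}$ with zero-mean errors, $n_i$ observations in treatment $i$, and $\hat{\beta}^{*}$ denoting the least squares slope of the overall line.

```latex
\begin{align*}
\hat{\beta}^{*}
  &= \frac{\sum_i \sum_j (X_{ij} - \bar{X}_{..})(Y_{ij} - \bar{Y}_{..})}
          {\sum_i \sum_j (X_{ij} - \bar{X}_{..})^2}, \\
E\left[\hat{\beta}^{*} \mid X\right]
  &= \beta + \frac{\sum_i n_i \,(\bar{X}_{i.} - \bar{X}_{..})(\mu_i - \bar{\mu})}
                  {\sum_i \sum_j (X_{ij} - \bar{X}_{..})^2},
  \qquad \bar{\mu} = \frac{\sum_i n_i \mu_i}{\sum_i n_i}.
\end{align*}
```

The second term vanishes when all treatment-specific covariate means $\bar{X}_{i.}$ are equal, or when all intercepts $\mu_i$ are equal. When the covariate means differ and a treatment effect is present, the overall slope absorbs part of the intercept difference, so the residuals understate the treatment difference.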

The ANCOVA models can be used to estimate the treatment-specific response means at specific values of
the covariate. But the ANOVA model on covariate-adjusted residuals should be used together with the fitted
overall regression line in such an estimation, as long as condition (7) holds.

Different treatment-specific covariate distributions within the same interval or different intervals might
also cause treatment-specific covariate means to be different. In such a case, ANCOVA should be preferred
over the methods on covariate-adjusted residuals.

In conclusion, we recommend the following strategy for the use of the above methods: (i) First, one should
check the significance of the effect of the covariates for each treatment, i.e., test $H_o'$: "all treatment-specific
slopes are equal to zero". If $H_o'$ is not rejected, then the usual (one-way) ANOVA or K-W test can be used.
(ii) If $H_o'$ is rejected, the covariate effect is significant for at least one treatment factor. Hence one should test
$H_o''$: "equality of all treatment-specific slopes". If $H_o''$ is rejected, then the covariate should be included in the
analysis as an important variable and the usual regression tools can be employed. (iii) If $H_o''$ is not rejected,
check the covariate ranges. If they are similar or have a considerable intersection for treatment factors, then
ANCOVA and methods on residuals are appropriate. Then one should check the underlying assumptions for
the methods and then pick the best method among them. (iv) If covariate ranges are very different, then it
is very likely that treatment and covariate are not independent, hence ANCOVA is not appropriate. On the
other hand, the methods on residuals can be used but they are extremely conservative. In this case, one may
apply some other method, e.g., MANOVA on (response, covariate) data for treatment differences.
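The strategy in steps (i)-(iv) amounts to a short decision procedure, which can be sketched as follows. The function and argument names are hypothetical; the three inputs stand for the outcomes of the two slope tests and of the check on covariate ranges, however those are carried out in practice.

```python
def recommend_method(slopes_all_zero_rejected, slopes_equal_rejected,
                     covariate_ranges_overlap):
    """Map the outcomes of steps (i)-(iv) to the suggested analysis."""
    if not slopes_all_zero_rejected:
        # Step (i): no detectable covariate effect.
        return "one-way ANOVA or K-W test on the response"
    if slopes_equal_rejected:
        # Step (ii): treatment-specific slopes differ.
        return "regression with the covariate as an important variable"
    if covariate_ranges_overlap:
        # Step (iii): parallel lines and comparable covariate ranges.
        return "ANCOVA or methods on covariate-adjusted residuals, after checking assumptions"
    # Step (iv): parallel lines but very different covariate ranges.
    return "methods on residuals (extremely conservative) or another method such as MANOVA"
```

The order of the checks mirrors the order of the steps, so later inputs are ignored once an earlier step has decided the outcome.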
6 Tables





        error term, $e_{ijk} \stackrel{ind}{\sim}$                              sample sizes    ranges of covariate for
case    treatment 1                      treatment 2                      $n_1$   $n_2$   treatment 1     treatment 2
----    -----------                      -----------                      -----   -----   -----------     -----------
1       $N(0,1)$                         $N(0,1)$                         20      20      (0,10)          (0,10)
2       $N(0,1)$                         $N(0,6)$                         20      20      (0,10)          (0,10)
3       $N(0,1)$                         $N(0,\sqrt{x})$                  20      20      (0,10)          (0,10)
4       $N(0,\sqrt{x})$                  $N(0,\sqrt{x})$                  20      20      (0,10)          (0,10)
5       $N(0,1)$                         $N(0,1)$                         28      12      (0,10)          (0,10)
6       $N(0,1)$                         $N(0,1)$                         20      20      (0,6)           (4,10)
7       $N(0,1)$                         $N(0,1)$                         20      20      (0,3)∪(7,10)    (4,10)
8       $N(0,1)$                         $N(0,1)$                         20      20      (0,4)∪(6,10)    (3,7)
9       $U(-\sqrt{3},\sqrt{3})$          $U(-\sqrt{3},\sqrt{3})$          20      20      (0,10)          (0,10)
10      $U(-\sqrt{3},\sqrt{3})$          $U(-2\sqrt{3},2\sqrt{3})$        20      20      (0,10)          (0,10)
11      $U(-\sqrt{3},\sqrt{3})$          $U(-\sqrt{x},\sqrt{x})$          20      20      (0,10)          (0,10)
12      $DW(0,1,3)$                      $DW(0,1,3)$                      20      20      (0,10)          (0,10)
13      $\sqrt{48}\,(\beta(6,2) - 3/4)$  $\sqrt{48}\,(\beta(6,2) - 3/4)$  20      20      (0,10)          (0,10)
14      $\chi^2_2 - 2$                   $\chi^2_2 - 2$                   20      20      (0,10)          (0,10)
15      $LN(0,1) - e^{1/2}$              $LN(0,1) - e^{1/2}$              20      20      (0,10)          (0,10)
16      $N(0,2)$                         $\chi^2_2 - 2$                   20      20      (0,10)          (0,10)

Table 1: The simulated cases for the comparison of ANCOVA and methods on covariate-adjusted residuals.
$e_{ijk}$: error term; $\stackrel{ind}{\sim}$: independently distributed as; $n_i$: sample size for treatment level $i = 1, 2$. $N(\mu, \sigma^2)$
is the normal distribution with mean $\mu$ and variance $\sigma^2$; $U(a,b)$ is the uniform distribution with support
$(a, b)$; $DW(a, b, c)$ is the double Weibull distribution with location parameter $a$, scale parameter $b$, and shape
parameter $c$; $\beta(a, b)$ is the Beta distribution with shape parameters $a$ and $b$; $\chi^2_2$ is the chi-square distribution
with 2 df; $LN(a, b)$ is the log-normal distribution with location parameter $a$ and scale parameter $b$.





        empirical sizes                                                     size comparison
Case    $\hat{\alpha}_1$  $\hat{\alpha}_2$  $\hat{\alpha}_3$  $\hat{\alpha}_4$     (1,2)  (1,3)  (1,4)  (2,3)  (2,4)  (3,4)
----    --------  --------  --------  --------     -----  -----  -----  -----  -----  -----
1a      .0531     .0541^l   .0540^l   .0532
1b      .0507     .0493     .0493     .0510               ≈      ≈             ≈      ≈
2a      .0581^l   .0576^l   .0546^l   .0612^l             ≈      <             <      <
2b      .0531     .0515     .0493     .0630^l                    <             <      <
3a      .0606^l   .0602^l   .0567^l   .0693^l                    ≈             ≈      <
4a      .0523     .0525     .0519     .0511                      ≈             ≈      ≈
5a      .0490     .0496     .0499     .0502        ≈             ≈             ≈      ≈
6a      .0556^l   .0024^c   .0024^c   .0033^c      >      >      >             ≈      ≈
7a      .0465     .0339^c   .0337^c   .0332^c      >      >      >             ≈      ≈
8a      .0474     .0437^c   .0433^c   .0440^c                    ≈      ≈      ≈      ≈
9a      .0485     .0489     .0484     .0488
10a     .0508     .0505     .0490     .0595^l                                         <
11a     .0522     .0515     .0511     .0576^l                    <             <      <
12a     .0490     .0494     .0492     .0491
13a     .0486     .0481     .0480     .0473
14a     .0442^c   .0435^c   .0417^c   .0451^c
15a     .0383^c   .0386^c   .0357^c   .0521                      <             <      <
16a     .0510     .0514     .0502     .0701^l                    <             <      <

Table 2: The empirical sizes and size comparisons of ANCOVA and methods on covariate-adjusted residuals
for the 16 cases listed in Table 1 based on 10000 Monte Carlo samples: $\hat{\alpha}_i$: empirical size of method $i$;
$(i,j)$: empirical size comparison of method $i$ versus method $j$ for $i,j = 1,2,3,4$ with $i \neq j$, where method $i = 1$
is for ANCOVA, $i = 2$ and $i = 3$ are for ANOVA with and without HOV on covariate-adjusted residuals,
respectively, and $i = 4$ is for K-W test on covariate-adjusted residuals. ^l (^c): Empirical size is significantly larger
(smaller) than 0.05; i.e., method is liberal (conservative). ≈: Empirical sizes are not significantly different
from each other; i.e., methods do not differ in size. < (>): Empirical size of the first method is significantly
smaller (larger) than the second.





        Proportion of agreement
Case    $\hat{\alpha}_{1,2}$  $\hat{\alpha}_{1,3}$  $\hat{\alpha}_{1,4}$  $\hat{\alpha}_{2,3}$  $\hat{\alpha}_{2,4}$  $\hat{\alpha}_{3,4}$
----    ------  ------  ------  ------  ------  ------
1a      .0520   .0519   .0429   .0540   .0432   .0431
1b      .0490   .0490   .0415   .0493   .0413   .0413
2a      .0560   .0545   .0419   .0546   .0415   .0405
2b      .0513   .0493   .0383   .0493   .0377   .0369
3a      .0581   .0565   .0468   .0567   .0469   .0453
4a      .0507   .0505   .0382   .0519   .0380   .0378
5a      .0473   .0389   .0382   .0396   .0392   .0388
6a      .0024   .0024   .0033   .0024   .0015   .0015
7a      .0338   .0336   .0286   .0337   .0260   .0260
8a      .0426   .0423   .0346   .0433   .0340   .0338
9a      .0475   .0473   .0417   .0484   .0422   .0420
10a     .0498   .0488   .0420   .0490   .0421   .0412
11a     .0507   .0504   .0456   .0511   .0457   .0454
12a     .0477   .0476   .0378   .0492   .0383   .0383
13a     .0476   .0476   .0369   .0480   .0371   .0371
14a     .0425   .0412   .0274   .0417   .0275   .0272
15a     .0367   .0355   .0253   .0357   .0252   .0246
16a     .0497   .0493   .0394   .0502   .0392   .0389

Table 3: The proportion of agreement values for pairs of methods in rejecting the null hypothesis for the 16
cases listed in Table 1 based on 10000 Monte Carlo samples: $\hat{\alpha}_{i,j}$: proportion of agreement between method
$i$ and method $j$ in rejecting the null hypothesis for $i,j = 1,2,3,4$ with $i \neq j$, where method labeling is as in
Table 2. Each proportion of agreement, $\hat{\alpha}_{i,j}$, is either not significantly different from, or significantly
smaller than, the minimum of $\hat{\alpha}_i$ and $\hat{\alpha}_j$.





cases        $\kappa_1$        $\kappa_2$        $\kappa_3$        $\kappa_4$
-----        ----------    ----------    ----------    ----------
1a; 1b       1.82; 1.30    1.82; 1.34    1.82; 1.34    1.90; 1.36
2a; 2b       3.58; 2.38    3.58; 2.50    3.58; 2.50    3.80; 2.60
3a; 3b       3.34; 2.38    3.34; 2.38    3.34; 2.38    3.36; 2.38
4a; 4b       4.06; 3.06    4.40; 3.06    4.40; 3.06    4.38; 2.92
5a; 5b       1.98; 1.42    2.06; 1.42    2.06; 1.42    2.06; 1.46
6a; 6b       2.96; 2.20    5.36; 3.30    5.36; 3.30    7.04; 3.58
7a; 7b       2.02; 1.40    2.28; 1.58    2.28; 1.58    2.70; 1.76
8a; 8b       1.98; 1.32    2.02; 1.40    2.02; 1.40    2.04; 1.46
9a; 9b       1.80; 1.30    1.80; 1.30    1.80; 1.30    2.02; 1.52
10a; 10b     2.74; 1.98    2.74; 1.98    2.74; 1.98    3.20; 2.34
11a; 11b     2.06; 1.44    2.06; 1.44    2.06; 1.44    2.34; 1.72
12a; 12b     1.74; 1.18    1.74; 1.26    1.74; 1.26    2.02; 1.60
13a; 13b     1.86; 1.22    1.86; 1.22    1.86; 1.22    1.98; 1.32
14a; 14b     4.46; 2.78    4.46; 2.78    4.46; 2.78    4.10; 2.26
15a; 15b     9.86; 5.58    9.86; 5.58    9.86; 5.58    3.66; 1.90
16a; 16b     3.38; 2.34    3.42; 2.34    3.42; 2.34    3.62; 2.64

Table 4: The intercept difference values at which the power estimates reach 1 for the 16 cases listed in Table
1 based on 10000 Monte Carlo samples: $\kappa_i$ = intercept difference value at which the power estimate of method $i$
reaches 1 for the first time for $i = 1, 2, 3, 4$, where method labeling is as in Table 2. Each entry lists the
value for the "a"-labeled case followed by the value for the "b"-labeled case.



7 Figures 
Figure 1: A sample plot for case 6, where observations from treatment i are marked with i, for i = 1, 2; trt
i = fitted regression line for treatment i = 1, 2; overall = overall fitted regression line.

Figure 2: A sample plot for case 7. Labeling is as in Figure 1.

Figure 3: Empirical sizes for ANCOVA and ANOVA on covariate-adjusted residuals versus the distance
between the treatment-specific means, $d = \bar{X}_{1.} - \bar{X}_{2.}$, with the corresponding 95% confidence bands.

Figure 4: Empirical power estimates versus intercept difference for cases 1a and 1b.

Figure 6: Empirical power estimates versus intercept difference for cases 5a-10a.



[Figure 7 panels: Case 11a, Case 12a, Case 13a, Case 14a, Case 15a, Case 16a. Horizontal axis: intercept difference. Curves: ANCOVA, ANOVA with HOV, ANOVA without HOV, K-W test.]

Figure 7: Empirical power estimates versus intercept difference for cases 11a-16a.



